27 research outputs found

    Second-generation PLINK: rising to the challenge of larger and richer datasets

    PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use. Comment: 2 figures, 1 additional file
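
The "bit-level parallelism" mentioned in the abstract above can be illustrated with a small sketch. This is not PLINK's code: it assumes, for illustration only, that genotypes are stored as 2-bit codes packed 32 to a 64-bit word, and the helper names `pack_codes` and `count_codes` are hypothetical. The point is that a few mask-and-popcount operations tally all 32 codes at once instead of looping over samples.

```python
# Illustrative sketch only; not PLINK's actual implementation.
# Assumption: 2-bit codes, 32 per 64-bit word, first code in the lowest bits.
MASK_LO = 0x5555555555555555  # selects the low bit of each 2-bit field


def pack_codes(codes):
    """Pack up to 32 two-bit codes (0..3) into a single integer word."""
    if len(codes) > 32:
        raise ValueError("at most 32 codes fit in one 64-bit word")
    word = 0
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError("codes must be in the range 0..3")
        word |= code << (2 * i)
    return word


def count_codes(word, n):
    """Count each 2-bit code among the first n fields of word.

    Unused fields above position n must be zero (pack_codes guarantees this).
    """
    lo = word & MASK_LO         # low bit of every field
    hi = (word >> 1) & MASK_LO  # high bit of every field, moved to the low position
    n11 = bin(lo & hi).count("1")   # both bits set
    n01 = bin(lo & ~hi).count("1")  # only the low bit set
    n10 = bin(hi & ~lo).count("1")  # only the high bit set
    n00 = n - n01 - n10 - n11       # remaining fields hold code 00
    return {0b00: n00, 0b01: n01, 0b10: n10, 0b11: n11}
```

For example, `count_codes(pack_codes([0, 1, 2, 3]), 4)` tallies one occurrence of each code. In a compiled language the `bin(...).count("1")` calls would typically be replaced by a hardware popcount instruction.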

    Vaccine breakthrough hypoxemic COVID-19 pneumonia in patients with auto-Abs neutralizing type I IFNs

    Life-threatening "breakthrough" cases of critical COVID-19 are attributed to poor or waning antibody response to the SARS-CoV-2 vaccine in individuals already at risk. Pre-existing autoantibodies (auto-Abs) neutralizing type I IFNs underlie at least 15% of critical COVID-19 pneumonia cases in unvaccinated individuals; however, their contribution to hypoxemic breakthrough cases in vaccinated people remains unknown. Here, we studied a cohort of 48 individuals (aged 20-86 years) who received 2 doses of an mRNA vaccine and developed a breakthrough infection with hypoxemic COVID-19 pneumonia 2 weeks to 4 months later. Antibody levels to the vaccine, neutralization of the virus, and auto-Abs to type I IFNs were measured in the plasma. Forty-two individuals had no known deficiency of B cell immunity and a normal antibody response to the vaccine. Among them, ten (24%) had auto-Abs neutralizing type I IFNs (aged 43-86 years). Eight of these ten patients had auto-Abs neutralizing both IFN-α2 and IFN-ω, while two neutralized IFN-ω only. No patient neutralized IFN-β. Seven neutralized 10 ng/mL of type I IFNs, and three 100 pg/mL only. Seven patients neutralized SARS-CoV-2 D614G and the Delta variant (B.1.617.2) efficiently, while one patient neutralized Delta slightly less efficiently. Two of the three patients neutralizing only 100 pg/mL of type I IFNs neutralized both D614G and Delta less efficiently. Despite two mRNA vaccine inoculations and the presence of circulating antibodies capable of neutralizing SARS-CoV-2, auto-Abs neutralizing type I IFNs may underlie a significant proportion of hypoxemic COVID-19 pneumonia cases, highlighting the importance of this particularly vulnerable population

    Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations

    Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used to identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at a lower disease risk, even when these two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (with respect to one chosen at random) can be quantified by using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking with better results than the current method of selection that is based on morphology. We review the advances described above and discuss related ethical considerations

    A short-read multiplex sequencing method for reliable, cost-effective and high-throughput genotyping in large-scale studies

    Accurate genotyping is important for genetic testing. Sanger sequencing-based typing is the gold standard for genotyping, but it has been underused due to its high cost and low throughput. In contrast, short-read sequencing provides inexpensive and high-throughput sequencing, holding great promise for reaching the goal of cost-effective and high-throughput genotyping. However, the short read length and the paucity of appropriate genotyping methods pose a major challenge. Here, we present RCHSBT (reliable, cost-effective and high-throughput sequence-based typing pipeline), which takes short sequence reads as input but, using a unique variant-calling and haploid sequence assembling algorithm, can accurately genotype with greater effective length per amplicon than even Sanger sequencing reads. The RCHSBT method was tested for the human MHC loci HLA-A, HLA-B, HLA-C, HLA-DQB1, and HLA-DRB1 on 96 samples using Illumina PE 150 reads. Amplicons as long as 950 bp were readily genotyped, achieving 100% typing concordance between RCHSBT-called genotypes and genotypes previously called by Sanger sequencing. Genotyping throughput was increased over 10 times, and cost was reduced over five times, for RCHSBT as compared with Sanger sequencing-based genotyping. We thus demonstrate RCHSBT to be a genotyping method comparable to Sanger sequencing-based typing in quality, while being more cost-effective and higher throughput. © 2013 Wiley Periodicals, Inc

    De novo assembly of a haplotype-resolved human genome

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine
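
The N50 statistic quoted in the abstract above can be computed with a short sketch. It uses the common definition (the largest length L such that pieces of length at least L cover half of the total); applying it to haplotype block lengths, as the abstract's "haplotype N50" suggests, is an assumption here, and the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of positive lengths.

    Lengths are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
```

For instance, with lengths 2 through 10 (total 54), the three longest pieces (10, 9, 8) sum to 27, exactly half of the total, so the N50 is 8.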

    An Integrated Tool to Study MHC Region: Accurate SNV Detection and HLA Genes Typing in Human MHC Region Using Targeted High-Throughput Sequencing

    The major histocompatibility complex (MHC) is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC, and associated regions, focus on minor variants and HLA typing, many of which have been demonstrated to be associated with human disease susceptibility and metabolic pathways. However, the detection of variants in the MHC region, and diagnostic HLA typing, still lacks a coherent, standardized, cost-effective and high-coverage protocol of clinical quality and reliability. In this paper, we present such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed to template upon the 8 annotated human MHC haplotypes, and to encompass the 5 megabases (Mb) of the extended MHC region. We deployed our probes upon three genetically diverse human samples for probe set evaluation, and sequencing data show that ~97% of the MHC region, and over 99% of the genes in the MHC region, are covered with sufficient depth and good evenness. 98% of genotypes called by this capture sequencing prove consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy when deployed at 4-digit resolution. This cost-effective and highly accurate approach for variant detection and HLA typing in the MHC region may lend further insight into immune-mediated disease studies, and may find clinical utility in transplantation medicine research. This one-step pipeline is released for general evaluation and use by the scientific community.